Summary of workshop on lexicons for text extraction
Abstract
The POETIC Lexicon

Lynne Cahill, of the University of Sussex, England, presented the design and specifications for the lexicon used in their Traffic Information Collator system, and the problems they encountered in porting it to the MUC task. This is an information extraction system used by local police personnel for traffic reports. The domain is characterized by a fairly tight and limited vocabulary, as well as a telegraphic style of syntax. Cahill discussed the relative ease with which the lexicon was adapted to new police force domains.

The general issue raised in Cahill's presentation was the problem of going from a sublanguage lexicon to a broader lexicon, as required for the MUC-5 English Joint Venture domain. The porting effort took 12 person-months over a period of 6 months. The MUC-5 lexicon design consists of a domain-specific lexicon and a phrasal lexicon. These were used in conjunction with the Alvey Natural Language Toolkit for parse recovery. Rich lexical information was added only to words that were significant in the domain as triggers for the template fills. Furthermore, the recognition of company and personal names was accomplished by standard pattern matching techniques.

Cahill discussed the different nature of the lexical entries in the two domains. Porting to MUC-5 required a new semantics and much more syntax. As a result, the incomplete lexicon gave rise to undergeneration of appropriate template objects, while fragmentary parsing resulted in template overgeneration, because too many patterns were accepted too liberally. There was, furthermore, no contextual feedback into the parser, and no way of selecting the most likely analysis of a given pattern if several fired. Cahill pointed out that these problems were largely due to time constraints in the development cycle, rather than the nature of the lexical design.
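The overgeneration problem described above can be illustrated with a minimal sketch. It is not the POETIC implementation: the two patterns, their names, and the `fire_patterns` helper are hypothetical, chosen only to show how liberal name patterns can fire on the same text when nothing ranks the competing analyses.

```python
import re

# Hypothetical name patterns (assumptions for illustration only; the
# source does not list the patterns the system actually used).
PATTERNS = {
    # Capitalized words followed by a company designator.
    "company_suffix": re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Corp|Inc|Ltd|Co)\b"),
    # Any run of two or more capitalized words: a deliberately liberal pattern.
    "capitalized_run": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),
}


def fire_patterns(text):
    """Return every (pattern_name, matched_text) pair, with no ranking.

    Every pattern that matches contributes a hit, so overlapping
    analyses of the same span are all kept.
    """
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


if __name__ == "__main__":
    sentence = "Toyota Motor Corp formed a venture with General Motors in Japan."
    for name, span in fire_patterns(sentence):
        print(f"{name}: {span}")
```

In this sketch the same span is returned by both patterns, and each hit would produce its own template object unless a later step selected among them. That selection step is what the summary reports as missing.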